traditional feature
Enhance the machine learning algorithm performance in phishing detection with keyword features
Recently, we can observe a significant increase of the phishing attacks in the Internet. In a typical phishing attack, the attacker sets up a malicious website that looks similar to the legitimate website in order to obtain the end-users' information. This may cause the leakage of the sensitive information and the financial loss for the end-users. To avoid such attacks, the early detection of these websites' URLs is vital and necessary. Previous researchers have proposed many machine learning algorithms to distinguish the phishing URLs from the legitimate ones. In this paper, we would like to enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method to incorporate the keyword features with the traditional features. This method is applied on multiple traditional machine learning algorithms and the experimental results have shown this method is useful and effective. On average, this method can reduce the classification error by 30% for the large dataset. Moreover, its enhancement is more significant for the small dataset. In addition, this method extracts the information from the URL and does not rely on the additional information provided by the third-part service. The best result for the machine learning algorithm using our proposed method has achieved the accuracy of 99.68%.
Multifractal-spectral features enhance classification of anomalous diffusion
Seckler, Henrik, Metzler, Ralf, Kelty-Stephen, Damian G., Mangalam, Madhur
Anomalous diffusion processes pose a unique challenge in classification and characterization. Previously (Mangalam et al., 2023, Physical Review Research 5, 023144), we established a framework for understanding anomalous diffusion using multifractal formalism. The present study delves into the potential of multifractal spectral features for effectively distinguishing anomalous diffusion trajectories from five widely used models: fractional Brownian motion, scaled Brownian motion, continuous time random walk, annealed transient time motion, and L\'evy walk. To accomplish this, we generate extensive datasets comprising $10^6$ trajectories from these five anomalous diffusion models and extract multiple multifractal spectra from each trajectory. Our investigation entails a thorough analysis of neural network performance, encompassing features derived from varying numbers of spectra. Furthermore, we explore the integration of multifractal spectra into traditional feature datasets, enabling us to assess their impact comprehensively. To ensure a statistically meaningful comparison, we categorize features into concept groups and train neural networks using features from each designated group. Notably, several feature groups demonstrate similar levels of accuracy, with the highest performance observed in groups utilizing moving-window characteristics and $p$-variation features. Multifractal spectral features, particularly those derived from three spectra involving different timescales and cutoffs, closely follow, highlighting their robust discriminatory potential. Remarkably, a neural network exclusively trained on features from a single multifractal spectrum exhibits commendable performance, surpassing other feature groups. Our findings underscore the diverse and potent efficacy of multifractal spectral features in enhancing classification of anomalous diffusion.
A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss
Li, Wenbiao, Wang, Ziyang, Wu, Yunfang
For readability assessment, traditional methods mainly employ machine learning classifiers with hundreds of linguistic features. Although the deep learning model has become the prominent approach for almost all NLP tasks, it is less explored for readability assessment. In this paper, we propose a BERT-based model with feature projection and length-balanced loss (BERT-FP-LBL) for readability assessment. Specially, we present a new difficulty knowledge guided semi-supervised method to extract topic features to complement the traditional linguistic features. From the linguistic features, we employ projection filtering to extract orthogonal features to supplement BERT representations. Furthermore, we design a new length-balanced loss to handle the greatly varying length distribution of data. Our model achieves state-of-the-art performances on two English benchmark datasets and one dataset of Chinese textbooks, and also achieves the near-perfect accuracy of 99\% on one English dataset. Moreover, our proposed model obtains comparable results with human experts in consistency test.
Land Classification in Satellite Images by Injecting Traditional Features to CNN Models
Aksoy, Mehmet Cagri, Sirmacek, Beril, Unsalan, Cem
Deep learning methods have been successfully applied to remote sensing problems for several years. Among these methods, CNN based models have high accuracy in solving the land classification problem using satellite or aerial images. Although these models have high accuracy, this generally comes with large memory size requirements. On the other hand, it is desirable to have small-sized models for applications, such as the ones implemented on unmanned aerial vehicles, with low memory space. Unfortunately, small-sized CNN models do not provide high accuracy as with their large-sized versions. In this study, we propose a novel method to improve the accuracy of CNN models, especially the ones with small size, by injecting traditional features to them. To test the effectiveness of the proposed method, we applied it to the CNN models SqueezeNet, MobileNetV2, ShuffleNetV2, VGG16, and ResNet50V2 having size 0.5 MB to 528 MB. We used the sample mean, gray level co-occurrence matrix features, Hu moments, local binary patterns, histogram of oriented gradients, and color invariants as traditional features for injection. We tested the proposed method on the EuroSAT dataset to perform land classification. Our experimental results show that the proposed method significantly improves the land classification accuracy especially when applied to small-sized CNN models.
Deep Search Query Intent Understanding
Liu, Xiaowei, Guo, Weiwei, Gao, Huiji, Long, Bo
Understanding a user's query intent behind a search is critical for modern search engine success. Accurate query intent prediction allows the search engine to better serve the user's need by rendering results from more relevant categories. This paper aims to provide a comprehensive learning framework for modeling query intent under different stages of a search. We focus on the design for 1) predicting users' intents as they type in queries on-the-fly in typeahead search using character-level models; and 2) accurate word-level intent prediction models for complete queries. Various deep learning components for query text understanding are experimented. Offline evaluation and online A/B test experiments show that the proposed methods are effective in understanding query intent and efficient to scale for online search systems.
Self-driving cars could be built without mirrors and a STEERING WHEEL
Waymo, the self-driving car unit of Google parent Alphabet, is pushing to get rid of many traditional car features, including mirrors, pedals and the steering wheel. It urged the National Highway Traffic Safety Administration (NHTSA) to'promptly' remove regulatory barriers for cars necessitating they have all traditional features. Legislation controlling the manufacture of cars dictates they must meet more than 70 auto safety standards, even if they are redundant. Waymo says a lot of the rules are not vital to self-driving cars because they work and operate in a completely different way to traditional vehicles. Waymo urged the National Highway Traffic Safety Administration (NHTSA) to'promptly' remove regulatory barriers for cars necessitating they have traditional features (file photo) Waymo, as well as Honda, Uber and Lyft, penned a letter to NHTSA asking for progress to be made in refining the rules to help streamline the development of autonomous cars.